# Run setup script.
# One-time installation of dviz.supp (either install_github call works):
# install.packages("remotes")
# remotes::install_github("clauswilke/dviz.supp")
# devtools::install_github("clauswilke/dviz.supp")

library(colorspace)
library(dplyr)
library(tidyverse)
library(ggforce)
library(ggridges)
library(treemapify)
library(forcats)
library(statebins)
library(sf)
library(cowplot)


options(digits = 3)
knitr::opts_chunk$set(
                 echo = FALSE,
              message = FALSE,
              warning = FALSE,
               cache = FALSE,
               #dpi = 105, # not sure why, but need to divide this by 2 to get 210 at 6in, 
                           # which is 300 at 4.2in
           fig.align = 'center',
           fig.width = 6,
             fig.asp = 0.618,  # 1 / phi
            fig.show = "hold"
            )
options(dplyr.print_min = 6, dplyr.print_max = 6)

1 Data Summary

We first read in two data sets, “income” and “life”, which hold income and life expectancy values for each country across many years. “income” has 193 observations and 220 variables, while “life” has 187 observations and 220 variables. Next, we reshape both data sets from wide to long format so that each has only three columns: Geo, Year, and either Income or Life Expectancy. We then merge the two reshaped sets into a data set called “LifeExpIncom”, which contains Geo, Year, Income, and Life Expectancy (40953 observations and 4 variables).

We then read in two more sets, “country” (240 observations and 11 variables) and “pop” (195 observations and 220 variables), which hold country information and population data respectively. We reshape “pop” in the same way so that Year becomes a single column, matching the layout of “LifeExpIncom”. We can then merge “LifeExpIncom” with “country”, and merge the result with the reshaped “pop” set, creating a set called “fin_data” (42705 observations and 15 variables). Finally, we subset the data so that we only focus on the year 2000. This gives us our “final_data” set (195 observations and 15 variables):

     geo                year             population       life.expectancy
 Length:195         Length:195         Min.   :7.85e+02   Min.   :44.1   
 Class :character   Class :character   1st Qu.:1.26e+06   1st Qu.:61.2   
 Mode  :character   Mode  :character   Median :6.01e+06   Median :70.5   
                                       Mean   :3.13e+07   Mean   :67.3   
                                       3rd Qu.:1.90e+07   3rd Qu.:74.7   
                                       Max.   :1.28e+09   Max.   :81.8   
                                                          NA's   :8      
     income         alpha.2            alpha.3           country.code
 Min.   :   529   Length:195         Length:195         Min.   :  4  
 1st Qu.:  2335   Class :character   Class :character   1st Qu.:209  
 Median :  6860   Mode  :character   Mode  :character   Median :418  
 Mean   : 13667                                         Mean   :425  
 3rd Qu.: 15700                                         3rd Qu.:643  
 Max.   :108000                                         Max.   :894  
 NA's   :8                                              NA's   :21   
  iso_3166.2           region           sub.region        intermediate.region
 Length:195         Length:195         Length:195         Length:195         
 Class :character   Class :character   Class :character   Class :character   
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   
                                                                             
                                                                             
                                                                             
                                                                             
  region.code    sub.region.code intermediate.region.code
 Min.   :  2.0   Min.   : 15     Min.   : 5.0            
 1st Qu.:  2.0   1st Qu.: 54     1st Qu.:11.0            
 Median : 19.0   Median :154     Median :14.0            
 Mean   : 71.7   Mean   :178     Mean   :14.9            
 3rd Qu.:142.0   3rd Qu.:202     3rd Qu.:17.0            
 Max.   :150.0   Max.   :419     Max.   :29.0            
 NA's   :21      NA's   :21      NA's   :119             
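
The reshape-and-merge steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's actual code: the toy tables and their values are made up, the helper `to_long()` is hypothetical, and the join key for “country” is assumed to be `geo`. The join types are also guesses, although the reported row counts are consistent with them: 187 × 219 = 40953 matches an inner join of “income” and “life”, and 195 × 219 = 42705 matches a final join that keeps every row of “pop”.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the wide tables: one row per country, one column per year.
# All values are invented for illustration.
income <- tibble(geo = c("aaa", "bbb", "ccc"),
                 `2000` = c(1000, 2000, 3000),
                 `2015` = c(1500, 2500, 3500))
life   <- tibble(geo = c("aaa", "bbb"),
                 `2000` = c(60, 70),
                 `2015` = c(65, 75))
pop    <- tibble(geo = c("aaa", "bbb", "ccc"),
                 `2000` = c(1e6, 2e6, 3e6),
                 `2015` = c(1.1e6, 2.1e6, 3.1e6))
country <- tibble(geo = c("aaa", "bbb", "ccc"),
                  region = c("Asia", "Europe", "Africa"))

# Hypothetical helper: wide -> long, giving the columns geo, year, <value_name>.
to_long <- function(wide, value_name) {
  pivot_longer(wide, cols = -geo, names_to = "year", values_to = value_name)
}

income_long <- to_long(income, "income")
life_long   <- to_long(life, "life.expectancy")
pop_long    <- to_long(pop, "population")

# Keep only the country-years present in both income and life (assumed inner join).
LifeExpIncom <- inner_join(income_long, life_long, by = c("geo", "year"))

# Add country information, then population, keeping every row of pop.
fin_data <- LifeExpIncom %>%
  left_join(country, by = "geo") %>%
  full_join(pop_long, by = c("geo", "year"))

# year is a character column after reshaping, so compare against a string.
final_data <- filter(fin_data, year == "2000")
print(final_data)
```

In this toy example, country “ccc” has no life expectancy record, so it drops out of the first join and comes back through the final join with missing values. That is one way the `NA's` counts in the summary above could arise.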

2 GGPlot

The scatter plot above shows the relationship between income, life expectancy, and population size across countries in the year 2000. Each point is a country, and the size of each point corresponds to that country’s population. The points are also color-coded by region.

From the plot, we can see a very slight positive correlation between income and life expectancy: countries with higher incomes tend to have longer life expectancies. Eyeballing the plot, countries in the Americas and Asia appear to have the largest populations. The plot also suggests that countries with larger populations tend to have longer life expectancies. Countries in Europe seem to have the longest life expectancies, with most of their points on the far right side of the graph, although their populations aren’t as large as those of countries in other regions.
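
A bubble chart of this kind, together with a numeric check of the correlation read off the plot, could be produced along the following lines. This is a sketch only: the data frame `toy` and its values are invented, and the aesthetic mapping (income on the x-axis, life expectancy on the y-axis, point size for population, color for region) is an assumption about how the original figure was built.

```r
library(ggplot2)

# Invented stand-in for the year-2000 subset.
toy <- data.frame(
  geo             = c("aaa", "bbb", "ccc", "ddd"),
  income          = c(900, 4500, 16000, 42000),
  life.expectancy = c(52, 66, 74, 80),
  population      = c(2e7, 1.2e9, 6e6, 8e7),
  region          = c("Africa", "Asia", "Americas", "Europe")
)

# One point per country; size encodes population, color encodes region.
p <- ggplot(toy, aes(x = income, y = life.expectancy,
                     size = population, color = region)) +
  geom_point(alpha = 0.7) +
  labs(x = "Income", y = "Life expectancy",
       size = "Population", color = "Region")
print(p)

# Pearson correlation, skipping rows with missing values,
# as a check on the impression from the plot.
r <- cor(toy$income, toy$life.expectancy, use = "complete.obs")
print(r)
```

Computing `cor()` on the real data would show whether the "very slight" positive relationship seen by eye holds up numerically.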

2.1 Subsetting Data for Year 2015

Next, we subset the data so that we only focus on data from the year 2015. This gives us a new “final_data” set (195 observations and 15 variables). First, though, let’s look at the overall summary statistics for the data set “fin_data”, which contains data from all years, not just 2015.

     geo                year             population       life.expectancy
 Length:42705       Length:42705       Min.   :6.42e+02   Min.   : 1     
 Class :character   Class :character   1st Qu.:2.83e+05   1st Qu.:31     
 Mode  :character   Mode  :character   Median :1.71e+06   Median :36     
                                       Mean   :1.30e+07   Mean   :43     
                                       3rd Qu.:5.94e+06   3rd Qu.:56     
                                       Max.   :1.42e+09   Max.   :84     
                                                          NA's   :2268   
     income         alpha.2            alpha.3           country.code 
 Min.   :   247   Length:42705       Length:42705       Min.   :  4   
 1st Qu.:   875   Class :character   Class :character   1st Qu.:208   
 Median :  1440   Mode  :character   Mode  :character   Median :418   
 Mean   :  4591                                         Mean   :425   
 3rd Qu.:  3460                                         3rd Qu.:643   
 Max.   :178000                                         Max.   :894   
 NA's   :1752                                           NA's   :4599  
  iso_3166.2           region           sub.region        intermediate.region
 Length:42705       Length:42705       Length:42705       Length:42705       
 Class :character   Class :character   Class :character   Class :character   
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   
                                                                             
                                                                             
                                                                             
                                                                             
  region.code   sub.region.code intermediate.region.code
 Min.   :  2    Min.   : 15     Min.   : 5              
 1st Qu.:  2    1st Qu.: 54     1st Qu.:11              
 Median : 19    Median :154     Median :14              
 Mean   : 72    Mean   :178     Mean   :15              
 3rd Qu.:142    3rd Qu.:202     3rd Qu.:17              
 Max.   :150    Max.   :419     Max.   :29              
 NA's   :4599   NA's   :4599    NA's   :26061           
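
The subsetting and summary steps in this section can be sketched as below. The toy `fin_data` and its values are made up for illustration. The one detail taken from the output above is that `year` is stored as a character column, which is why the filter compares against the string "2015".

```r
library(dplyr)

# Invented stand-in for the full, all-years data set.
fin_data <- tibble(
  geo             = rep(c("aaa", "bbb"), each = 3),
  year            = rep(c("1800", "2000", "2015"), times = 2),
  income          = c(500, 3000, 4000, 900, 20000, 30000),
  life.expectancy = c(30, 60, 65, 35, 76, NA)
)

# Summary over all years, comparable to the table above.
print(summary(fin_data))

# Keep only the 2015 rows.
final_data <- fin_data %>% filter(year == "2015")
print(summary(final_data))
```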

3 Plotly

The plot above shows the relationship between income, life expectancy, and population size across countries in the year 2015. Each point is a country, and the size of each point corresponds to that country’s population. Each country is also color-coded.

The x-axis shows the income level of each country, so countries with higher incomes appear farther to the right. The y-axis shows life expectancy, so countries with higher life expectancies appear higher up. From the plot, we can see that a few countries dominate the scatter plot because of their population size and income. We can use the plot to check whether countries with higher incomes generally have longer life expectancies, or to examine whether population size correlates with higher or lower income levels.
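
An interactive version of this plot could be built with the plotly package roughly as follows. This is a sketch under assumptions: plotly is not among the packages loaded in the setup chunk, so it is assumed to be installed, the data frame `toy` and its values are invented, and the marker size range is arbitrary.

```r
library(plotly)

# Invented stand-in for the year-2015 subset.
toy <- data.frame(
  geo             = c("aaa", "bbb", "ccc", "ddd"),
  income          = c(1200, 6000, 21000, 48000),
  life.expectancy = c(58, 70, 77, 82),
  population      = c(3e7, 1.3e9, 7e6, 8e7)
)

# Income on x, life expectancy on y; marker size encodes population
# and each country gets its own color. Hovering shows the country code.
fig <- plot_ly(
  data = toy,
  x = ~income, y = ~life.expectancy,
  type = "scatter", mode = "markers",
  size = ~population, sizes = c(10, 50),
  color = ~geo,
  text = ~geo,
  marker = list(opacity = 0.7, sizemode = "diameter")
)
fig <- layout(fig,
              xaxis = list(title = "Income"),
              yaxis = list(title = "Life expectancy"))
fig
```

With one color per country, the legend becomes long on the full data set. Coloring by region instead is a common alternative, at the cost of no longer distinguishing individual countries by color.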